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Abstract 

A software tool, computing observed and expected upper limits on poissonian process rates using 
a hybrid frequentist-bayesian CLg method, is presented. This tool can be used for simple counting 
experiments where only signal, background and observed yields are provided or for multi-bin ex¬ 
periments where binned distributions of discriminating variables are provided. It allows to combine 
several channels and takes into account statistical and systematic uncertainties, as well as correlations 
of systematic uncertainties between channels. It has been validated against other software tools or 
analytical calculations, in several cases. 
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1 Introduction 


In a physics experiment, the result must sometimes be interpreted in term of upper limits on the rate 
of a particular physical process of interest, dubbed signal. OpTHyLiC offers an easy-to-use hybrid 
frequentist-bayesian solution to set upper limits for an arbitrary number of channels and backgrounds, 
using the CLg method. It can be used for simple counting experiments where only signal, background 
and observed yields are provided as well as for multi-bin experiments where binned distributions of 
discriminating variables are provided. Statistical and systematic uncertainties are taken into account in a 
bayesian way, and correlations of systematic uncertainties accross backgrounds and channels are properly 
accounted for. OpTHyLiC can be downloaded from [1]. 

Other tools such as McLimit [2] or RooStats [3] provide the ability to compute limits using the 
hybrid method. However, OpTHyLiC was specifically optimized for this method: it is light, simple and 
fast. Furthermore, it provides several options to configure the treatment of statistical and systematic 
uncertainties. 

The document is organized as follows. The notations used in this paper are introduced in Sec. 2. The 
statistical method implemented in OpTHyLiC is described in Sec. 3. The software is described in Sec. 4 
and its validation in Sec. 5. 

2 Notations and definitions 

The following notations will be used throughout the document: 

• /i: signal strength, defined as the actual signal rate divided by a hxed reference signal rate value 
(such as obtained from a theoretical prediction); 

• /iup: upper limit on the signal strength; 

• n: number of channels; 

• Sci- event yield for signal process in channel c (c € [Ijn-]) and bin /; 

• 5”°™: nominal event yield for signal process in channel c and bin /; 

• a cl- absolute statistical uncertainty for signal process in channel c and bin 1; 

• bail', event yield for type i background process in channel c and bin Z; 

• nominal event yield for type i background process in channel c and bin Z; 

• total nominal yield for background processes in channel c and bin Z; 

i G backgrounds 

• acii- absolute statistical uncertainty for type i background process in channel c and bin Z; 

• Nci: event yield in channel c and bin Z; 

• event yield actually observed in the data or the pseudo-data in channel c and bin Z. 

A channel corresponds to a region enriched in signal events (also called signal region). In the case 
of multi-bin experiments, the user must provide binned distributions (or histograms) for each channel, 
background and signal. Bins of these distributions are referred to by the index Z. For counting experi¬ 
ments, only yields for each background and signal in each channel are needed. In this case, the index Z is 
irrelevant and is discarded. 
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3 Method description 


OpTHyLiC implements the frequentist CLs method [4]. Pseudo-experiments are generated and the 
distribution of a test statistic is determined under both signal plus background and background only 
hypotheses. From these distributions, CLg is computed and the upper signal strength is found by 
solving the equation 

CL, (^) = a, (1) 


for a fixed value of a which is set depending on the chosen confidence level 1 — a. For 95% confidence level 
upper limits, a = 0.05. The test statistic used is the ratio of likelihoods under signal plus background 
and background only hypotheses: 


= -2 In 


C{p) 


( 2 ) 


The general form of the likelihood is discussed in Sec. 3.1. The statistical and systematic uncertainties 
are included by the use of nuisance parameters, as detailed in Sec. 3.2 and 3.3, respectively. These 
uncertainties are accounted for in a bayesian way: the inference of the observed upper limit is performed 
from pseudo-experiments using the likelihood marginalised over the nuisance parameters, as explained in 
Sec. 3.4. The procedure for calculating the expected limits is summarised in Sec. 3.5. 


3.1 Statistical model 

The full likelihood, including all nuisance parameters, is given by: 

(/rscz -I- bci ] 


C,/ _ 

where: 


NJ 


o-ilJ-Soi+bci) 


f (4; ^ci) n / &cr. n 9 ivj) , 


(3) 


• the index c runs over the channels; 

• the index i runs over the backgrounds; 

• the index I runs over the bins; 

• the index j runs over the systematic uncertainties; 

• Sci = s'^i X ‘ {{Vj})-, 

,b,,= ^ 6,,,= ^ 

zGbackgrounds iGbackgrounds 

The set of nuisance parameters 5^;,?7j} can be divided into two categories. The first ones, 
WcU^'cu}^ account for the statistical uncertainties, arising from the finite size of samples used to estimate 
the signal and background nominal yields and 5”°™’ respectively, are constrained by the functions 
/. The second ones, account for the systematic uncertainties, and are constrained by the functions 
g. The variation of the signal and background yields under the effect of the systematic uncertainties are 
described by the functions {{Vj}) ^-cid respectively. 

Bins of discriminating variable histograms are considered independent, as channels. It is therefore 
equivalent to run a counting experiment with N channels and a binned experiment with N bins where 
the yields and uncertainties in each bin match those in the channels. 


3.2 Treatment of statistical uncertainties 

The nuisance parameters and 6F, accounting for the statistical uncertainties on the nominal signal and 
background yields, are constrained by probability density functions (p. d. f.) / of the form: 

(4) 
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where y is the nuisance parameter, the nominal yield and a the statistical uncertainty, which can be 
the square root of summed squared weights (e.g. when is estimated from a mixture of normalised 
simulated samples). 

In OpTHyLiC, five different p. d. f. can be used as constraints: a normal distribution, a log-normal 
distribution, or three different types of gamma distributions. 


3.2.1 Normal and log-normal constraints 


The parameters of the normal and log-normal p. d. f. are chosen so that the average and the standard 
deviation of these distributions are equal to and cr, respectively. The normal distribution has the 
form: 


/N(2/;2/““,a) = ^expi- 

(TV 27r 


(y - 

2 cr 2 


and the log-normal distribution has the form: 


1 


/l( 2/; 2/"°“, cr) = exp ( - 

y by Ztt 


(Iny - af 

2P 


with 


= ln 


/ (ynom)2 \ 

and b = . 

V \/( 2 /"°“)" + 



In 1 + 


(y“°“) 


(5) 


( 6 ) 


(7) 


The normal distribution can be defined for any real y values, including non-physical negative event 
yields. On the contrary, the log-normal distribution is defined only for y > 0. 


3.2.2 Gamma constraints 

The three gamma p. d. f. have the general form: 


fG{y,a,b) 


a{ay)^~'^ e-‘^y 

TW 


( 8 ) 


where a and b are the rate and shape parameters respectively. As for the log-normal, the gamma 
distributions are defined for any strictly positive value or y. 

This p. d. f. can be seen as the posterior distribution obtained from an auxiliary measurement account¬ 
ing for the poissonian nature of the statistical uncertainty. Accounting for this nature is not straightfor¬ 
ward since events are in general weighted. A popular approach (used for example in HistFactory [5]) is 
to consider an imaginary auxiliary measurement in which all events have unit weight (which can therefore 
be described by a Poisson distribution) and in which the relative statistical uncertainty is equal to that 
used in the main measurement (cr/y"°™). 

Let Naux be the number of events observed in this auxiliary measurement. The auxiliary measurement 
likelihood is: 


P{N^u.; A) 


r(iVaux + i)® 


(9) 


where A is the (unknown) nuisance parameter. It can be written as: 


A = 7A^ 


nom 
aux 5 


( 10 ) 


where is the nominal value of Naux and 7 the nuisance parameter affecting the yield in the main 

measurement - the product of Poisson terms in Eq. 3 - in a multiplicative way: y = 7 ?/"°™. is found 

by imposing that the relative statistical uncertainty is the same in the auxiliary and main measurements: 


jy-nom _ 

'^aux 


( 11 ) 
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The auxiliary measurement likelihood can then be written as follows: 


-P(^aux;7) 


(7(y„om/^)2)'^' 

r (fVaux + 1) 




(12) 


From this likelihood, posterior distributions for 7 can be determined for various prior distributions tt ( 7 ): 


ail) 


P(iVaux = iVa"uT;7)^(7) 

J P(iVa„,, = iVruT;7)7r(7)d7 


(13) 


These posterior distributions can then be translated into posterior distributions for the yield y (hereafter 
denoted as f {y; , a), using the same notation as in Eq. 4). 

Three prior distributions are considered; in each case, the posterior distribution for j/ is a gamma 
distribution with the general form given by Eq. 8: 

• TT ( 7 ) oc 1 (uniform prior); in this case, the posterior distribution is: 

f{y; a) = /g (j/; a = b = {y^°^/af + l) ; (14a) 

• TT ( 7 ) OC s/l (Jeffreys prior^); in this case, the posterior distribution is: 

/(j/; a) = /g {y; a = b = (y““/a)" + I/ 2 ) ; (14b) 

• TT ( 7 ) OC 1/7 (hyperbolic prior); in this case, the posterior distribution is: 

fiy; 2 /"°“, a) = /g (y; a = & = {y^°^/af) . (14c) 


The three gamma constraints available in OpTHyLiC correspond to the three posteriors given in 
Eq. 14a, 14b and 14c. They are summarized in Tab. 1, where expectation and standard deviation for y 
are also given (we recall that for the gamma p. d. f. in Eq. 8 , one has E [y] = b/a and var [y] = b/a^). 
As can be seen from this table, the expectation and standard deviation are equal to the nominal yield 
and statistical uncertainty only for the hyperbolic prior tt ( 7 ) oc I/ 7 , as for the normal and log-normal 
constraints. For the gamma constraints with the uniform and Jeffreys priors, it is not the case, but the 
difference vanishes in the asymptotic limit. 


prior 

posterior parameters 

E[y] 

var [y] 

a 

b 

TT ( 7 ) OC 1 

ynoral^2 

(ynom/g,)2 

ynom + 

(^l + cr 2 /(yn°“) 2 ^ 

7 r( 7 ) oc 1 /^ 

ynoral^2 

(ynom/^)2 ^ 

ynora ^cr'^lynom 

(l + iaV(y--)2) 

TT ( 7 ) OC 1/7 

ynoral^2 

{y^°^/af 

yiiom 



Table 1: Summary of the three gamma constraints available in OpTHyLiC. The constants a and b are 
parameters of the posterior distribution which general form is given in Eq. 8 . 


3.2.3 Comparison of the constraint functions 

The five available constraint functions - normal, log-normal and the three gamma distributions - are 
compared for three different values of y"°™ and ct in Fig. 1. When a is small with respect to y''°™, the 
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Figure 1: Comparison of normal, log-normal and gamma probability density functions used for statistical 
uncertainties for three values of 2 /"°™ and a. 


distributions are very close to each other. Otherwise, the differences between the distributions can be 
significant. 

Care should be taken when statistical uncertainties are large or when nominal yields are equal to zero. 
When statistical uncertainties are large, constraint p. d. f. can be truncated at zero in the normal case. 
In such cases, log-normal and gamma should be preferred. When nominal yields are equal to zero, log¬ 
normal and gamma are undefined. In such cases, OpTHyLiC automatically selects a normal constraint 
truncated at zero. 


3.3 Treatment of systematic uncertainties 


Systematic uncertainties on nominal yields and are accounted for by including the set of nui¬ 
sance parameters {? 7 j}. They are assumed to be either 100% correlated or completely uncorrelated. The 
total number of nuisance parameters is therefore equal to the total number of independent systematic un¬ 
certainties, including all channels, backgrounds and signals. The correlation factor between two nuisance 
parameters r\j and rik is given by the Kronecker symbol Sjk- The term constraining nuisance parameters 
in the likelihood can thus be factorized into the product of individual constraint terms. In Eq. 3, the 
nuisance parameter for each systematic uncertainty of index j is constrained by a standard normal p. d. f. 
g of the form: 



(15) 


As stated in Sec. 3.1, the effect of systematic uncertainties on the yield is described by relations of 
the form: 

y = (16) 


where y is the varied yield, the nominal yield and {{Vj}) th® function describing the variation 
of the yield with the set of nuisance parameters {rij}. OpTHyLiC provides two different solutions for 
combining the effect of multiple nuisance parameters: 


• additive: 


E [^^"^‘(^y)-l]; (17a) 

jGsystematics 


• multiplicative: 


j Gsystematics 


^We recall that Jeffreys prior is tt (7) oc ^7(7), where the Fisher information is 7(7) = E K 


Eq. 12 in this expression leads to tt (7) oc l/-^/7. 


^ aiogP(iVaux;7) ^ 
d'y ^ 


(17b) 


. Injecting 
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In Eq. 17a and 17b, (rjj) is the function describing the variation of the yield with nuisance parameter 
j. In both cases, when the effect of only one nuisance parameter is considered, {rjj) = {rjj) as 
it should. 

For each systematic uncertainty j, the corresponding nuisance parameter rij is chosen such that r]j = 0 
corresponds to no variation, rjj = +1 to a +la variation, and rjj = —1 to a —Icr variation. The main 
problematic associated to systematic uncertainties is the choice of the function (jjj) that relates the 
effect of the systematic uncertainty to its associated nuisance parameter. Usually, hY^^ is known, for 
each systematic, for rjj = 0, —1 and +1. The value for rjj = 0 is by definition equal to 1: it corresponds 
to the case y = Let hj (hj) be the relative variation of the yield when systematic j is varied by 

+1 (—1) cr. Thus, for all j: 

/if"* {rjj = 0 ) = 1 ; (18a) 

h] = {rjj = + 1 ) - 1 ; (18b) 

hj = hY^Yv,=-^)-l- (18c) 

The problem consists in finding continuous functions hY^^ {Vj)j interpolating for rjj € [—1,+1] and 
extrapolating for rjj > +1 and rjj < —1, and such that equations 18 are satisfied - at least approximately. 
Four choices are currently available in OpTHyLiC, and are represented in Fig. 2: 

• piece-wise linear interpolation and extrapolation, defined in Sec. 3.3.1; 

• piece-wise exponential interpolation and extrapolation, defined in Sec. 3.3.2; 

• polynomial interpolation and exponential extrapolation, defined in Sec. 3.3.3; 

• “McLimit” interpolation and extrapolation, defined in Sec. 3.3.4 and corresponding to the choice 
followed in the McLimit program. 


3.3.1 Piece-wise linear interpolation and extrapolation 

The piece-wise linear interpolation and extrapolation is defined as follows: 




{ 1 + rjjhj if rjj is positive; 
1 — rjjhj if rjj is negative. 


(19) 


From this definition, it appears that /if"* can be negative if hj or /ij is negative. This unphysical 
behaviour is dealt with in OpTHyLiC by setting /i®^®* to zero when it occurs. 


3.3.2 Piece-wise exponential interpolation and extrapolation 

The piece-wise exponential interpolation and extrapolation is defined as follows: 


hY^\vY 



if rjj is positive, 
if rjj is negative. 


( 20 ) 


From this definition, it appears that /if"* can be negative if /ij or /ij is lower than -1. This unphysical 
behaviour is dealt with in OpTHyLiC by applying a linear interpolation and extrapolation instead (see 
Sec. 3.3.1) when it occurs. If both /ij and /ij are greater than -1, /i®^®* is positive for all rjj values. 
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hj = -0.03, h] = 0.5 h| = -0.5, h] = 0.5 





Figure 2: Illustration of interpolation and extrapolation functions implemented in OpTHyLiC for /ij = 
—0.03 and /ij = 0.5 (top left), hj = —0.5 and /ij = 0.5 (top right), hj = 0.2 and /ij = —0.5 (bottom left) 
and hj = —0.5 and hj = —0.5 (bottom right). 


3.3.3 Polynomial interpolation and exponential extrapolation 

The polynomial interpolation and exponential extrapolation is defined as follows: 


(i+d) 




Vj 


if r]j > 1 


1 + I] a,T]^j if -1 < Vj < 1> 


2=1 


( 21 ) 


(l + hj'j ' if r]j < -1, 

where the coefficients are chosen such that (rjj ) and its first and second derivatives are continuous 
\Vj\ = 1 rjj = 0. 


3.3.4 “McLimit” interpolation and extrapolation 

The “McLimit” interpolation and extrapolation is defined as follows: 

{ 1 + B if B > 0 , 

( 22 ) 

if B < 0, 


where 


B = 


r]jhj {1 — R) + RQ if rjj is positive, 
-rjjhj (1 — i?) + RQ if rjj is negative. 


(23) 








































































































and 


1 


Q = r]j- 


h] - h\ 


T 




and R = 


1 + 3|?7j| 


(24) 


This definition ensures that the first derivative of at rjj = 0 is continuous and that > 0 for all 
rjj values. It should be noted that equation 18b (18c) is exactly satisfied when hj > 0 (hj > 0) but only 
to first order when hj < 0 (hj < 0). Indeed, the following relations hold: 


hf = +1) 



A 

0 


if /ij > 0, 


A 

0 

+ hj 

if hj > 0. 


(25) 


(26) 


It should also be noted that, when the effect of the considered uncertainty is symmetric (hj = —hj), this 
interpolation/extrapolation scheme is equivalent to the piece-wise linear one (see Sec. 3.3.1) for r]j > 0 if 
> 0 or rjj < 0 if /if > 0. 


3.4 Inference of observed npper limit /tup 

The observed upper limit on the signal strength /tup is derived from the CLs method using (Eq. 2) as 
test statistic, is computed using the nominal likelihood: 

C{ji) = C{n, {4} = {s-ri, M = 0). (27) 

It thus reduces to the following simple form: 


9m = 

C,l 


(28) 


where q'^ is the test for channel c and bin I given by: 




= 2 




-fV,ln^ 


Lnom 

^cl 


Lnom 

^cl 


(29) 


The distributions of q^ under signal plus background and background only hypotheses (hereafter 
denoted as p(g^|/i) and p(g^|0) respectively) are determined by generating pseudo-experiments from the 
marginal likelihood: 

(li) = f c{ji, {s'^lX^l,^l3}) n n n (so) 

j c,l i 

In practice, this is done by first generating nuisance parameter values from their constraint p. d. f. 
and by then generating Nd using the nuisance parameter values obtained in the first step. Typical 
distributions produced by OpTHyLiC are shown in Fig. 3. 

Once p{q^\ji) and p{q^\0) have been determined, the CLs is computed by using: 


CLs (jx) 


CLs+b 

CLb 


oo 

oo ’ 

E p(<?A*|o) 

9m =9^*’= 


(31) 


where is the observed value of the test (g°'^® = q^ ({/Vc;} = {/V°;*’®})). The upper limit is found by 
searching p such that CLs{p) is equal to a (see Eq. 1). 
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Figure 3: Example of distributions of under signal+background (/r' = /r) and background only hy¬ 
potheses (/i' = 0). 


3.5 Computation of expected limits 

OpTHyLiC can also be used to compute expected limits on the signal strength, under background only 
hypothesis. Five expected limit quantiles are available: median, —2a, —Icr, -|-lcr and +2a. The last 
four ones are defined using the standard normal distribution. Denoting these quantiles by Z and the 
corresponding probability by p, one has: 

Z = $-1 (p), (32) 

where <i> is the cumulative distribution function of the standard normal distribution. The values of 
probabilities for the quantiles available in OpTHyLiC are given in Tab. 2. 


z 

-2 

-1 

0 (median) 

-bl 

-b2 

p 

0.0228 

0.1587 

0.5 

0.8413 

0.9772 


Table 2: Values of probabilities associated to the quantiles available in OpTHyLiC for the calculation 
of expected limits. 


Two methods are provided to compute expected limits. In the first one, the distribution of pup is 
determined by generating pseudo-experiments under background only hypothesis. Expected limits are 
then given by the quantiles (as dehned above) of this distribution. In the second one, expected limits are 
calculated in the same way as observed ones (see Sec. 3.4) replacing the observed CLg by median, —2cr, 
— Icr, -l-lcr and -|-2cr quantiles of the CLg distribution under background only hypothesis. 

Both methods are equivalent from the statistical point of view but different in terms of implementation 
and computing time. In most cases, the second method performs much faster than the first one and should 
therefore be preferred. The first method can be used for cross-checks or if the distribution of is needed. 

It should be noted that, when using the second method, quantiles are not computed from a single 
distribution as in the first method: each quantile is computed in a separate job. Therefore, due to 
statistical fluctuations, the expected values may not be sorted as they should (for example, the —2cr 
expected value can be higher than the — la expected value). In such cases it is recommended to increase 
the number of pseudo-experiments. 
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4 Software description 

4.1 Structure and prerequisites 

OpTHyLiC is written in C++ and uses the ROOT library [6]. The code is separated in different 
classes. Their definitions and implementations are separated into different files which are placed in the 
main OpTHyLiC directory. Once compiled, the libraries can either be used dynamically by using a macro 
interpreted with the ROOT interpreter (CINT and CLING for version 5 and 6 of ROOT, respectively), 
or using a proper C++ program compiled into an executable with a software like gcc. Two simple 
examples, working for both cases, are available in the examples sub-directory. 

As the statistical method relies on pseudo-experiments, pseudo-random number generators are used. 
ROOT provides in its TRandomS class an implementation of the MT19937 Mersenne twister [7]. The 
CH—h standard library in its recent C-|—hll standard [8] also provides various pseudo-random number 
engines, allowing to generate numbers distributed with the various probability distributions mentioned in 
this paper. Therefore, if the available compiler or ROOT version makes the use of the C++11 standard 
possible, the user has the possibility to chose these pseudo-random engines instead of the TRandomS class. 
Thanks to relevant preprocessor directives, the portions of the code specific to the C++11 standard are 
ignored, either by the interpreter or by the compiler, when its use is not possible. 

If used as a compiled executable, the examples have been tested to be fully functional with gcc version 

4.8.2 (4.3.0) or newer, with ROOT version 5.28/OOb, when C++1I features are (are not) used. If used 
as interpreted macros, the examples have been tested to be fully functional with ROOT versions as old 
as 6.02/03 (5.28/OOb), when C++II features are (are not) used. Older versions of gcc and ROOT may 
also be used but have not been tested. Other compilers than gcc could also have been chosen, but have 
not been considered in the installation procedure described below. 

The usage instructions given below corresponds to version 2.00 of OpTHyLiC. 

4.2 Installation and examples 

OpTHyLiC is installed in three steps: 

1. running the INSTALL script to prepare the compilation by creating a Makefile which structure 
depends on how OpTHyLiC is intended to be used; 

2. running make to compile the shared libraries according to the Makefile; 

3. running the setup. [c] sh script created in the first step, to update the LD_LIBRARY_PATH environ¬ 
ment variable with the directory where the shared libraries are available. 

The INSTALL shell script is in the main directory. Using INSTALL —help (or INSTALL -h) displays an 
help detailing the different options, which allows the user to configure the Makefile: 

• —executable (or -e) to compile executables with gcc; if this option is not used, two ROOT 
macros, Compile.C and examples/load.C, for the compilation and needed for the compilation and 
for loading the shared libraries, are created in addition to the Makefile; 

• —C++11 (or -C) to enable the C++11 features; in this case, the script checks if the gcc or ROOT 
versions are recent enough to do so; 

• —permissive to override the check of the gcc or ROOT versions when using the previous option. 

If no option is used, the libraries will be compiled by ROOT to be used with an interpreted macro, 
without the C++II features. Once installed, OpTHyLiC can be used from any location when opening 
a new shell, by running the script setup . [c] sh. 

Two examples of programs using OpTHyLiC, runLimits.C and runSignificance.C, are provided 
in the examiples sub-directory, together with examples of input files "inputl.dat", "input2.dat", 
"inputHistos.dat" and "inputHistosAlternativeSyntax.dat", the syntax of which is detailed 
in Sec. 4.3 ("inputl.dat" and "input2.dat" illustrate the syntax for counting experiments and 
"inputHistos.dat" and "inputHistosAlternativeSyntax.dat" for multi-bin experiments). The ex¬ 
ample runLimits. C illustrates the calculation of expected and observed limits, while runSignif icance. C 
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shows how to plot the test statistic distribution under the background only and signal plus background 
hypotheses and calculate the observed and expected p-values and significances. The syntax of the main 
functions used in both examples is explained in Sec. 4.4. 

These example programs can be run using the following syntax: 

• root -1 load.C ’runLimits.C("inputl.dat", "input2.dat") ’ when used as a ROOT macro; 

• ./runLimits.exe —files inputl.dat input2.dat when used as an executable. 

Here, files for counting experiments are provided. Files for multi-bin experiments are provided using 
exactly the same syntax (OpTHyLiC automatically recognises what type of input files are provided by 
the user and applies the right treatment in each case). All input files don’t have to be of the same type. 
For example, it is possible to provide one file using the counting experiment syntax and another file using 
the multi-bin experiment syntax : 

• root -1 load.C 'runLimits.C("inputl.dat","inputHistosAlternativeSyntax.dat")’ ; 

• ./runLimits.exe —files inputl.dat inputHistosAlternativeSyntax.dat 

In these examples, preprocessor directives are used to allow their use in both modes, or to enable the 
CH—hll features. In the first mode, these examples can also be run from any location, if the load. C macro 
written to load the shared libraries is run before. In the second mode, the executable can be moved to 
any location; furthermore, any program located in the examples sub-directory would be compiled when 
running the make command, as long as its name is of the form run*. C. 

4.3 Input file format 

The input file format depends on the type of experiment. The format for counting experiments is described 
in Sec. 4.3.1 and the one for multi-bin experiments is described in Sec. 4.3.2. In both cases, users must 
create one input file per channel. 

4.3.1 Counting experiment 

The general structure of the files is shown if Fig. 4. 

+sig <name> <yield> <stat> 

.syst <name> <up> <down> 

.syst <nEmie> <up> <down> 

+bg <name> <yield> <stat> 

.syst <ncmie> <up> <down> 

.syst <ncmie> <up> <down> 

+bg <name> <yield> <stat> 

.syst <ncmie> <up> <down> 

.syst <ncmie> <up> <down> 

# . . . 

# add as many backgrounds as needed 

# . . . 

+data <yield> 

Figure 4: General structure of OpTHyLiC’s input files for counting experiments. 

The block starting with the +sig tag defines the signal sample. Blocks starting with the +bg tag define 
the background samples. Finally, the observation is defined with the +data tag. Systematic uncertainties 
are declared using the . syst tag. Fields <name> define the names for signal, backgrounds and systematic 
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uncertainties. Fields <yield> define the nominal yields for signal and backgrounds and the observed yield 
for data. Fields <stat> define the absolute statistical uncertainty for signal and backgrounds. Fields <up> 
and <down> define h} and for each systematic uncertainty. When systematics for different samples 
(signal or backgrounds) or different channels have the same name they are supposed 100 % correlated 
(i.e. they are described by a unique nuisance parameter ijj). Otherwise they are treated as uncorrelated. 
Lines starting with # are not interpreted. The order in which the signal, background and data block 
appear does not matter. 

LaTeX tables summarizing yields and uncertainties can be produced (see Sec. 4.4.2 for instructions 
on how to do this). By default, the sample and systematic names used in these tables are the ones 
specified in the input files in the <name> fields. The user can change background sample names by 
adding .namieLaTeX tags in background sample blocks. Systematic names can be changed by creating 
a dictionnary file mapping names as specified in the <naine> fields to names used for LaTeX tables. 
Channel names used for LaTeX tables can also be specified by adding +nameLaTeX tags in the input 
files. 

Fig. 5 and 6 show an example of what could be the two files in an analysis combining two channels 
and the dictionnary file defining systematic names for LaTeX tables. 

+nameLaTeX $e\mu$ +nameLaTeX $\mu\mu$ 


+bg Bkgl 0.8 0.1 
.nameLaTeX $t\bar{t}$ 
.syst Systl -0.05 0.12 
.syst Syst2 0.04 -0.04 

+sig Sig 2.5 0.6 
.syst Systl 0.21 -0.13 

+data 1 


+bg Bkg2 2.3 0.4 
.nameLaTeX $t\bar{t}+W/Z$ 
.syst Syst3 0.01 0.01 

+sig Sig 2.8 1.1 
.syst Systl 0.05 -0.13 
.syst Syst4 -0.02 -0.09 

+data 3 


Figure 5: Example of input text files for a two channels combination. 


Systl JES 

Syst2 $b$-tagging 

Syst3 $E_{T}~\text{miss}$ 

Syst4 norm 

Figure 6: Example of dictionnary file defining systematic names for LaTeX tables. 

In this example, all backgrounds and signals have non-zero statistical uncertainty. The total number 
of systematic uncertainties (and thus of nuisance parameters r/j) is 4. The one called Systl in the input 
files (JES in the LaTeX tables) affects the background yield of the first channel and signal yields of the 
two channels. LaTeX tables produced by OpTHyLiC for this example are shown in Tab. 3, 4, 5 and 6, 
including captions that are generated automatically. 
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Sample 

efi 

fifj, 

tt 

0.80±0.101[!;J^ 

— 

ti + W/Z 

— 

2.30 ± 0.40t^;(Jo 

Total bkg. 

0.80±0.10l°;J° 

2.30 ± 0.40t°;°2 

Data 

1 

3 

Signal 

2.50±0.60l(J;g 

2.8±l.ll°;i 


Table 3: Expected and observed yields. The first uncertainty is statistical. The second uncertainty is an 
approximation of the total systematic uncertainty, not taking into account the correlations between them. 


Sample 

e/x 


ti 

0.811°;14 

— 

ti + W/Z 

— 

9 00+0.42 

^•^O_o.3s 

Total bkg. 

0.811°;14 

9 00+0.42 

^•^O_o.38 

Data 

1 

3 

Signal 

2 47+0-84 
-0.63 

2 

^•^-0.9 


Table 4: Expected and observed yields. The uncertainty is a combination of the statistical and systematic 
uncertainties, taking into account the correlations between systematics. 


Uncertainty 

tt 

Signal 

JES 

-5.00 
+ 12.00 

+21.00 

-13.00 

5-tagging 

±4.00 

— 

TTriiiss 

Hjrp 


— 

norm 

— 

— 

Total 

+ 12.65 
-6.40 

+21.00 

-13.00 


Table 5: List of relative systematic uncertainties (in %) for channel e/i. 


Uncertainty 

tt + W/Z 

Signal 

JES 

— 

+5.00 

-13.00 

5-tagging 

— 

— 

TT'miss 

1 Jrjn 

+ 1.00 
+ 1.00 

— 

norm 

— 

-2.00 

-9.00 

Total 

+ 1.00 
+0.00 

+5.00 

-15.81 


Table 6: List of relative systematic uncertainties (in %) for channel /i/r. 
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4.3.2 Multi-bin experiment 


Two syntaxes can be used. Both are similar to that for counting experiments, except that ROOT files 
containing histograms are provided instead of numbers. Only differences with respect to the syntax 
described in Sec. 4.3.1 are therefore described. 

Syntax 1: This syntax can be used only if all ROOT files are in the same directory and all histograms 
inside them have the same name. The general structure of the files is shown in Fig. 7. An example of 
such file is "inputHistos.dat" in the examples sub-directory. 

+setup 

■directory <dir> 

■histoName <hname> 

+sig <name> <root-file> 


. syst 

<ncmie> 

<root-file 

up> 

<root-file 

down> 

. syst 

<ncmie> 

<root-file 

up> 

<root-file 

down> 

+bg <name> <root-file> 




. syst 

<name> 

<root-file 

up> 

<root-file 

down> 

. syst 

<name> 

<root-file 

up> 

<root-file 

down> 

+bg <name> <root-file> 




. syst 

<ncmie> 

<root-file 

up> 

<root-file 

down> 

. syst 

<ncmie> 

<root-file 

up> 

<root-file 

down> 


# . . . 

# add as many backgrounds as needed 

# . . . 

+data <root file> 

Figure 7: General structure of OpTHyLiC’s input files for multi-bin experiments using syntax 1. 

The block starting with the +setup tag defines the fields that are common to all samples declared in 
the file (signal, backgrounds and data). Field <dir> defines the directory containing all the ROOT files. 
Field <hname> defines the name of the histograms inside the ROOT files. Fields <root-f ile> define the 
names of the ROOT files located in the <dir> directory containing the nominal histograms. Bin errors 
are used for absolute statistical uncertainties. Users must therefore make sure they are set properly, in 
particular if histograms are weighted. Fields <root-file up> and <root-file down> define the names 
of the ROOT files located in the <dir> directory containing the varied histograms when systematic 
uncertainties are shifted by -tier and —Icr respectively. These histograms must contain yields in each 
bin (that is, as opposed to the counting experiment case, yields after systematic variation are provided 
rather than relative variations). In addition, bin errors for systematic histograms are not used and can 
be set to 0 or any other value. 

Syntax 2: This syntax can be used in all cases, even if all ROOT files aren’t in the same directory and 
all histograms inside them don’t have the same name. The general structure of the files is shown in Fig. 8. 
An example of such file is "inputHistosAlternativeSyntax.dat" in the examples sub-directory. 

Fields <root-file>, <root-file up> and <root-file down> define the full path names of the 
ROOT files containing the nominal and varied histograms. Fields <hname> define the names of the 
histograms inside the ROOT files. They must be between parentheses with no blanck space between the 
left parenthesis and the ROOT file name. 
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+sig <ncmie> <root-file> (<hnaLme>) 

. syst <name> <root-file up> (<lincmie>) 
. syst <name> <root-file up> (<linEmie>) 


<root-file down>(<hname>) 
<root-file down>(<hname>) 


+bg <name> <root-file>(<hncmie>) 

.syst <name> <root-file up>(<hnEmie>) 
.syst <name> <root-file up>(<hnEmie>) 


<root-file down>(<hname>) 
<root-file down>(<hname>) 


+bg <name> <root-file>(<hname>) 

.syst <name> <root-file up>(<hnEmie>) 
.syst <name> <root-file up>(<hnEmie>) 


<root-file down>(<hname>) 
<root-file down>(<hname>) 


# . . . 

# add as many backgrounds as needed 

# . . . 


+data <root file>(<hnEmie>) 

Figure 8: General structure of OpTHyLiC’s input files for multi-bin experiments using syntax 2. 


4.4 Running the software 

4.4.1 Configuration 

In order to use the software, users need to load the OpTHyLiC library, instantiate an object of type 
OpTHyLiC and add channels. This can be done interactively in a ROOT session as follows: 

gSystem->Load("OpTHyLiC_C"); 

OpTHyLiC oth(0TH::SystMclimit,OTH::StatNormal,OTH::TR3,0,0TH::CombAutomatic); 
oth.addChannelCchannel 1 name",filel); 
oth.addChannelCchannel 2 name",file2); 


The constructor requires at least two parameters. The first one is the interpolation/extrapolation 
method: the possible values are OTH: : SystLinear, OTH: : SystExpo, OTH: :SystPolyexpo, and 

OTH: : SystMclimit, corresponding to the four possibilities described in Sec. 3.3. The second one is the 
type of constraint for statistical uncertainties: the possible values are OTH: :StatNormal, OTH: :StatLogN, 
OTH: : StatGammaHyper, OTH: : StatGammaUni, and OTH: : StatGammaJeff reys, corresponding to the five 
possibilities described in Sec. 3.2. 

In addition, three optional parameters can be provided. The first one is the type of pseudo-random 
number engine: the possible values are OTH: :TR3 (used by default), and, if C-I--T11 features are avail¬ 
able, OTH: ;STD_ followed by the name of one of the nine engines implemented in the C-T-Til stan¬ 
dard library (mtl9937, mtl9937_64, minstd_rand, minstd_randO, ranlux24_base, ranlux48_base, 
rELnlux24, rcLnlux48, knuth_b). The second one is the seed of this engine; the default value 0 has 
the effect of setting a randomly generated seed. The third one is the chosen solution for the combina¬ 
tion of systematic uncertainties, as described in Sec. 3.3: the possible values are OTH: :CombAdditive, 
OTH: :CombMultiplicative, or OTH: :CombAutomatic (used by default), the latter requesting additive 
(multiplicative) combination when the OTH: :SystLinear or OTH: :SystMclimit (OTH: :SystExpo or 
OTH: :SystPolyexpo) options are used for the interpolation/extrapolation method. 

The addChannel member function takes three parameters. The first two ones are of std: : string 
type and correspond to the name of the channel and the name of the input file (see Sec. 4.3) respectively. 
The third one, used only in the case of multi-bin experiments, is a boolean set by default to true. In the 
case of multi-bin experiments, OpTHyLiC converts the input file and ROOT files provided by the user 
into multiple counting experiment files (written using the syntax described in Sec. 4.3.1). More precisely, 
OpTHyLiC creates one counting experiment file per bin of discriminating variable (only bins containing 
either a non-zero yield or a non-zero statistical uncertainty for the signal and total background are 
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considered). If the third parameter is set to false, counting experiment files created by OpTHyLiC are 
dumped on disk, in the same directory as the input file. Their names are of the form f ileName_binZ. ext, 
where f ileName is the name of the input file, ext its extension and I is the bin index. These files can be 
useful for debugging or for running OpTHyLiC over a subset of bins. 

As mentioned in Sec. 3.1, for multi-bin experiments the histogram bins are considered independent 
(as channels in counting experiments). Therefore, in the rest of this paper the term ’’channel” refers 
in such case to a bin of the discriminating variable distribution in one channel, reflecting the software 
structure. 

4.4.2 LaTeX tables production 

LaTeX tables are produced as follows: 

ofstream ofs(filenamie); 

int precision = -1; 

int NpseudoExps = 1000000; 

oth.createInputYieldTable(ofs.precision); 

oth.createGeneratedYieldTable(ofs.precision,NpseudoExps); 

oth.createSysteTables(ofs,"systDict.txt".precision); 

These three methods take as first parameter an output stream, which can be the standard output 
(std: : cout). 

The functions createInputYieldTable and createGeneratedYieldTable produce tables of expected 
and observed yields. In the first case, the total uncertainty for each process is evaluated by summing in 
quadrature all uncertainties without taking into account the correlations, while in the second case the 
total uncertainty is assessed by using pseudo-experiments (the number of pseudo-experiments is given as 
the facultative third parameter). An example of such tables are shown in Tab. 3 and 4. The precision 
of the numbers in these tables can be adjusted to a fixed value (given as the second parameter) or to be 
automatically adjusted depending on the size of the uncertainties (when the precision parameter is set 
to - 1 , which is the default value). 

The function createSysteTables creates one table for each channel which summarises all the sys¬ 
tematic uncertainties. Examples of such tables are shown in Tab. 5 and 6 . The second parameter of 
createSysteTables is a string specifying the dictionnary hie. An example of such hie is given in Fig. 6 . 

4.4.3 Observed limit computation 

Observed limits are computed as follows: 
double els; 

double limit=oth.sigStrengthExclusion(OTH::LimDbserved,nbExp,cls); 

The hrst parameter of function sigStrengthExclusion dehnes the limit type (here observed). Other 
types can be used for expected limits (see Sec. 4.4.4). The second parameter (nbExp) is the number of 
pseudo-experiments and the third parameter the hnal CLg value (corresponding to the exclusion). An 
optional fourth parameter may be provided and corresponds to a hint of the observed limit. This value 
does not affect the result of the computation but only its speed: giving a hint that is rather close to the 
hnal value would most probably speed up the process. Finally, an optional hfth parameter may be used 
to determine which method to use for the computation. The default value (OTH: :MethDichotomy) is the 
most accurate one. The other possibility is to use an extrapolation method (OTH: :MethExtrapol) that 
is faster but sometimes less accurate. 

By default, the computed limits are for a 95% conhdence level. It is possible to modify this conhdence 
level as follows: 

oth.setConfLevel(0.9); 

for example for a 90% conhdence level. 
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4.4.4 Expected limits computation 

Two methods are provided to compute expected limits as described in Sec. 3.5. 

With the main method, a single expected (median, —2a, — Icr, +la or +2a) limit is com¬ 
puted as the observed one (see Sec. 4.4.3) changing the first parameter to OTH: : LimExpectedM2sig, 
OTH: :LimExpectedMlsig, OTH: : LimExpectedMed, OTH: :LimExpectedPlsig or OTH: :LimExpectedP2sig 
depending on the limit type. For example, the expected median limit is computed using: 

double limit=oth.sigStrengthExclusion(OTH::LimExpectedMed,nbExp,cls); 

Like in Sec. 4.4.3, it is possible to provide a hint of the expected limit and to use the extrapolation 
method when speed is preferred to accuracy. 

With the alternative method, all expected (median, —2cr, —la, -\-la and +2a) limits are computed 
at the same time using: 

oth.expectedSigStrengthExclusion(nbMu,nbExp); 

where nbMu and nbExp are the number of entries in the distribution (see Sec. 3.5) and the number of 
pseudo-experiments, respectively. 

4.4.5 Scan of CLg as a function of the signal strength 

It is possible to perform a scan of the CLg value for several values of the signal strength, using: 

oth.scanCLsVsMu(min,max,steps,nbExp,OTH::LimExpectedMed); 
oth.getCLsVsMu()->Draw("alp"); 

The first three parameters define the signal strength range to be scanned, steps being the number of 
values to be scanned. The parameter nbExp is the number of pseudo-experiments to compute each CLg. 
The last parameter defines the type of observed value to be used, within the list OTH : : LimExpectedM2sig, 
OTH: :LimExpectedMlsig, OTH: : LimExpectedMed, OTH: :LimExpectedPlsig or OTH: :LimExpectedP2sig 
and OTH: : LimObserved. 

By fitting the returned graph, the user may compute the corresponding limit, alternatively to the 
methods presented in Sec. 4.4.3 and 4.4.4. 

4.4.6 Significance computation 

In addition to observed and expected upper limits, OpTHyLiC can be used to compute the signihcance 
of an observation using: 

std::pair<double, double> r = oth.significatnce(type,nbExp,mu); 

where type is the type of significance, nbExp the number of pseudo-experiments and 
mu the signal strength (set by default to 1 if not specified). The possible values for 
type are OTH::SignifObserved, OTH::SignifExpectedP2sig, OTH::SignifExpectedPlsig, 
OTH::SignifExpectedMed, OTH::SignifExpectedMlsig, OTH: : SignifExpectedM2sig. If 

OTH: :SignifObserved is used, the observed significance is computed. Otherwise, expected signif¬ 
icances under the signal plus background hypothesis are computed. 

The significance method returns a std: :pair<double, double>. The first element of the pair 
corresponds to the p-value given by: 

<iT 

p-value = E ^^(9.10) (33) 

9m =0 

and the second one to the significance s in units of normal distribution standard deviation. It is given 
by: 

/■“ 1 

p-value = / ,_ e ^ da; ( 34 ) 

Js 
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4.4.7 Access to histograms 

During the limit computation, several histograms are produced. After the computation, some of these 
histograms are available. 

The final distributions can be accessed using the following commands: 

oth.getHistoLLRb()->Draw(); 

oth.getHistoLLRsb()->Draw("sernie") ; 

In the case of the alternative expected limit computation, the distributions are not accessible but two 
other histograms are available: 

oth.getDistrExpMu()->Draw(); 
oth.getDistrCLs()->Draw(); 

where the first one is the expected distribution of /iup from which the expected limits are derived and 
the last one is the full distribution of computed CLs (that should peak at 1-confidence level). 

Other interesting distributions are channel dependent, therefore the user must first access a specific 
channel. There are two ways to do so: 

• oth.getChcUinelC"name") where name is the name of the channel, 

• oth.getChannelCindex) where index is the index of the channel returned by the addChauinel 
function. 

In the case of multi-bin experiment, "name" should be of the form channelNcmie_bin^, where channelName 
is the name of the channel specified by the user (first parameter of the addChannel method) and I is the 
bin index. The final g^ distributions for a single channel can be accessed using the following commands: 

oth.getChcUinel(index)->getHistoLLRb()->Draw(); 

oth.getChcUinel(index)->getHistoLLRsb()->Draw("sernie"); 

as well as the distributions of number of events: 

oth.getChcUinel(index)->getHisto(OTH::Channel::hDistrBg)->Draw(); 
oth.getChcUinel(index)->getHisto(OTH::Channel::hDistrSB)->Draw("same"); 

Other interesting histograms are the distributions of the systematic uncertainties, for example: 

oth.getChannel(index)->getSigSystDistr("Syst1")->Draw(); 

oth.getChannel(index)->getBkgSystDistr("Bkg","Syst2")->Draw(); 

where the last parameter is the name of the systematic uncertainty and the first parameter of the 
getBkgSystDistr function is the name of the background. 

4.4.8 Single channel computations 

To compute the limits for a single channel, not combining with the others, the following function is 
available: 

double limit=oth.getChannel(index)->sigStrengthExclusion(OTH::LimObserved,nbExp,cls); 

After this call, channel dependent histograms (as described in the previous section) are available. 

For the alternative expected limit computation, the following function may be invoked: 

double limit=oth.getChannel(index)->expectedSigStrengthExclusion(nbMu,nbExp); 

In this case, all previous histograms are available as well as a few more: 

oth.getChcUinel(index)->getDistrExpMu()->Draw(); 
oth.getChcUinel(index)->getExpMuVsObs()->Draw("alp"); 
oth.getChcUinel(index)->getDistrCLs()->Draw(); 
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where the second one is the expected Hup as a function of the number of observed events. 

It is also possible to study the distribution of the yields after the statistical and systematic variations. 
In order to generate these distributions, the following function must be called: 

oth.getChcUinel(index)->generateDistrYield(nbExp); 

then, the generated distributions are available: 

oth.getChcUinel(index)->getSigYieldDistr()->Draw(); 

oth.getChcUinel(index)->getBkgYieldDistr("Bkg")->Draw(); 

oth.getChannel(index)->getHisto(OTH::Channel::hYieldBg)->Draw(); 

where the hrst function gives the distribution for the signal, the second one for a single background 
sample, and the third one is the distribution for the total background. These distributions are also 
available after a call to createGeneratedYieldTable. 

4.4.9 Test statistic distribution and derived quantities 

The test statistic distributions under background and signal plus background hypotheses for a desired 
value of the signal strength /i can be computed as follows: 

oth.setSigStrength(mu); 
oth.generateDistrLLR(nbExp); 

where mu is the desired /i value and nbExp is the number of pseudo-experiments. Once this is done, distri¬ 
butions can be accessed as described in Sec. 4.4.7. The p-value of the observation under the background 
hypothesis and the CLg can be computed as follows: 

double pvalue=oth.pValueData(); 
double cls=oth.computeCLsDataO; 

The p-value under the background hypothesis and the CLg for a single channel can be computed 
using: 

double pvalue=oth.getChcUinel(index)->pValue(nObs); 
double cls=oth.getChannel(index)->computeCLs(nObs); 

where nObs is the observed yield. For a single channel, the user can also compute the excluded yield for 
the chosen /i value using: 

int yield=oth.getChannel(index)->findObsExclusion(); 


5 Software validation 

The computation of upper limits and observation significances with OpTHyLiC involves many calcu¬ 
lations of different types: generation of random numbers, calculation of p-values {CLs+b and CLb), 
scanning over /x values to solve Eq. 1, combination of channels, interpolation and extrapolation of sys¬ 
tematic uncertainties, marginalization of statistical and systematic uncertainties, calculation of quantiles 
for expected limits, etc. Several studies have been performed in order to validate these calculations. 
OpTHyLiC is compared either to a theoretical solution in situations where such a solution exists, to 
bayesian calculations in situations where they are equivalent to the hybrid CLg under interest or to results 
obtained with the McLimit software which is equivalent to OpTHyLiC when specihc choices for the 
interpolation/extrapolation of systematics and constraints for statistical uncertainties are made. These 
studies are summarized in the following sections. 
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5.1 Validation of calculations without uncertainties 

OpTHyLiC has first been validated in the simplest situation where signal and background yields have 
no uncertainties (neither systematic nor statistical), both in the single and multiple channels cases. 

In the single channel case, an analytical solution for /iup exists. Indeed, CLg^f, and CLf, are given by 


N° 


CL 


s-\-b 


= E 


ifis 




N=0 


m 


- e~ 


(/iS 


“+h" 


“) = 1 - F^2 (2 ; 2 + l)) 


and 

= E jyi = ^-Fx^ 2 + l)) 

N=0 

where F ^2 (a;; d) is the cumulative distribution function of the chi-squared distribution with d degrees of 
freedom at x. Eq. 1 then yields 


/^up — 


0.5 X F-^^ (1 - a [1 - F^2 (26‘““; 2 + l))] ; 2 + l)) - 

^nom 


(35) 


Fig. 9 shows a comparison between this analytical result and OpTHyLiC for = 0.82 x L, 
gnom _ 2 49 X L and = 1 x L, with L = 1,..., 7. Excellent agreement is found. Other tests have 
been performed with other values of and always leading to the same conclusion. 
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Figure 9: Upper limit as a function of L computed with OpTHyLiC and from the analytical result 
(Eq. 35) for 6'^°“ = 0.82 x L, = 2.49 x L and = 1 x L. 
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In the multiple channels case, two validations were made. In the first one, upper limits calculated with 
several channels were compared to upper limits calculated with a single channel in situations where the 
two are expected to give identical results. Such situations occur when yields in the various channels are 
related to each other by a simple multiplicative factor. For example, the same limits should be obtained 
in these two cases: 

• n channels each with 

— background yield=&"°“/n 

— signal yield=s''°™/n 

— observed yield=A^°'^®/n 

• a single channel with 

— background yield=&"°™ 
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— signal yield=s''°™ 

- observed yield=A^°'^® 


It has been checked for several values of and n that OpTHyLiC indeed finds the 

same limits in both cases. In the second one, the general case where yields in the various channels can’t 
be related to each other by a simple multiplicative factor has been considered. It can be seen from Eq. 
28 and Eq. 29 that the upper limit can be computed using the following test 


fVeff = ^ N,I3, 

c 


with 


Pc = In 


Snorri 


In the asymptotic limit, iV^ is normally distributed. Thus 


A^eff ~ AA I E + bTn , Pi + bTn 


/ .f'c 


under the signal plus background hypothesis and 


NeS ~ PP E Pcb 




under the background only hypothesis (A/” (a, b) is the normal distribution with mean a and variance b). 
CLs is therefore given by 


CL, 


$ 






$ 




E 


(36) 


where $ is the cumulative distribution function of the standard normal distribution. Eq. 1 with Eq. 36 
can easily be solved by dichotomy. This result has been used to validate the channel combination proce¬ 
dure implemented OpTHyLiC in the asymptotic limit. Fig. 10 shows a comparison of this asymptotic 
result with OpTHyLiC in the three channels example defined by 

• channel 1: = 5.18 x L, = 2.22 x L and = 3 x T 

• channel 2: = 3.05 x L, = 1.61 x L and = 4 x L 

• channel 3: = 4.45 x L, = 2.95 x L and = 2x L 


As can be seen from Fig. 10, OpTHyLiC converges, as expected, to the asymptotic result as the 
number of events increases. 


5.2 Validation of calculations with statistical and systematic uncertainties 

Several other studies have been performed in order to validate the treatment of statistical and systematic 
uncertainties in OpTHyLiC. As already explained, OpTHyLiC treats uncertainties in a Bayesian way 
by marginalizing the likelihood. A simple check of the marginalization procedure has been performed in 
the single channel case by comparing marginal distributions of the yield as computed by OpTHyLiC to 
analytical distributions in the case where a single background affected by a statistical uncertainty con¬ 
strained with a gamma p. d. f. contributes to the expected yield. Indeed, it is known that, in this case, 
that the marginal distribution of the yield, given by the compound of a Poisson and a gamma distribution, 
is negative binomial: 

cOO 

P(N = a)= P(N = n\b) x f{b; a)db 

Jo 

_r(iV+(^)')^ bnom . ^2 Y 

~ ^nom _(_ 0-2 y fonom + J 
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Figure 10: Upper limit ^up as a function of L computed with OpTHyLiC and from the asymptotic result 
(Eq. 36) for the three channels example defined in the text. 


where N is the observed yield, b is the background yield, its nominal value, a its statistical un¬ 
certainty, P{N = n\b) the Poisson distribution with parameter b, f {b\b'^°'^, a) the gamma distribution 
for b with mean and standard deviation a (given by Eq. 14c) and P{N = n\b’^°’^,a) the marginal 
(negative binomial) distribution of the yield N. Fig. 11 shows that the agreement between Eq. 37 
and marginal distributions calculated by OpTHyLiC is very good. Excellent agreements have also been 
observed for the two other gamma definitions available in OpTHyLiC (Eq. 14a and 14b). 



Figure 11: Marginal distribution of the yield under the background only hypothesis in the case where 
the background has a statistical uncertainty constrained by a gamma p. d. f. with mean 5'^°“ = 15 and 
three different values of standard deviation: a = 2,7 and 14. 


In order to validate not only the treatment of statistical uncertainties but also that of systematic 
uncertainties, upper limits calculated with OpTHyLiC have been compared to upper limits calculated 
using a bayesian method. It can indeed be shown that, in the single channel case, the hybrid CLg method 
implemented in OpTHyLiC is equivalent to the bayesian method when a uniform prior on the signal 
strength fi is used and when the signal has no uncertainties (neither statistical nor systematic) associated 
to it [9]. This equivalence holds for any number of background sources, any number of statistical and 
systematic uncertainties associated to them and any type of correlation between systematic uncertainties 
across background sources. This result is used to complete the validation of the treatment of statistical 
uncertainties and to validate the treatment of systematic uncertainties. In order to perform this validation, 
an independent bayesian software, based on RooStats, has been developed. This software implements 
exactly the same likelihood as OpTHyLiC and takes as input the same files, hence allowing direct 
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comparison to OpTHyLiC. Several comparisons between OpTHyLiC and bayesian results have been 
performed by changing the number of background sources and the number of systematic and statistical 
uncertainties. One such comparison has been performed using the configuration given in Fig. 12, with 
L = 1,..., 7. The result is shown in Fig. 13. 

+bg Bkgl 25*L 7. 

.syst Systl 0.1 -0.3 
.syst Syst2 0.3 -0.2 

+bg Bkg2 25*L 12 
.syst Systl 0.2 -0.05 
.syst Syst3 -0.06 0.15 

+bg Bkg3 33.33*L 3.5 
.syst Syst2 -0.1 0.3 
.syst Syst3 0.15 -0.15 
.syst Syst4 -0.6 0.6 

+bg Bkg4 16.67*L 5 
.syst Syst5 -0.3 0.25 

+sig Sig 5*L 

+data 90*L 

Figure 12: One of the examples used to validate the treatment of uncertainties in OpTHyLiC. 




Figure 13: Upper limit /lup as a function of L without (left) and with (right) statistical and systematic 
uncertainties. It has been computed using exponential interpolation and extrapolation for systematic 
uncertainties and normal constraint terms for statistical uncertainty. 

This example has been chosen because uncertainties are large and their effect on upper limits is very 
pronounced (as can be seen by comparing plots on the left and right of Fig. 13). Thus, any mis-treatment 
of uncertainties in OpTHyLiC should be visible. Very good agreement between OpTHyLiC and the 
bayesian calculation with uniform prior is found for this example. Similar agreements are found in other 
cases. Rigorously, these comparisons validate only the treatment of statistical and systematic uncertaintes 
for backgrounds. However, signal uncertainties are treated by the same code as background uncertainties. 
Signal uncertainties are therefore expected to be treated properly. As will be seen below, the comparison 
between McLimit and OpTHyLiC gives confidence that signal uncertainties are indeed treated properly. 

The last validation compares observed and expected {-2a, -la, median, +ltT and -\-2a) upper limits 
calculated with OpTHyLiC to upper limits calculated with McLimit. McLimit is, as OpTHyLiC, 
a hybrid frequentist-bayesian tool, using the interpolation/extrapolation described in Sec. 3.3.4 and 
normal constraints for statistical uncertainties. When configured appropriately, OpTHyLiC is therefore 
expected to give the same upper limits as McLimit^. For this comparison, typical inputs from high 

^Profiling of uncertainties has been turned off in McLimit so as to allow direct comparison to OpTHyLiC. 
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energy physics analysis have been used. Fig. 14 shows a comparison of upper limits as a function of 



Figure 14: Comparison of observed and expected upper limits calculated with McLimit and 
OpTHyLiC in a realistic case (see text). 


the mass of a hypothetical new particle obtained by combining three channels. Six mass points have 
been considered and seven background processes contribute to the yield in each channel. Signal and 
background processes are all affected by statistical and systematic uncertainties. For each mass, the total 
number of nuisance parameters is 51 (27 are associated to systematic uncertainties and 24 to statistical 
uncertainties). In both cases, 50 000 pseudo-experiments are used. Limits found with OpTHyLiC are 
in good agreement with those found with McLimit. The main difference between the two softwares is 
the computing time. For example, on the same computer, it took 25 minutes (8 seconds) to calculate the 
observed limit for the 1 TeV point with McLimit (OpTHyLiC). The full plot is produced in less that 
6 minutes with OpTHyLiC while it takes several hours with McLimit. 


6 Conclusion 

A tool computing observed and expected limits has been presented. This tool, named OpTHyLiC, is 
written in C-|—I- and uses the ROOT library. It implements the hybrid frequentist-bayesian CLg method 
for hypothesis testing.lt can be used with an arbitrary number of channels and both for counting and 
multi-bin experiments. Statistical and systematic uncertainties are accounted for as well as correlations 
between systematic uncertainties. Several types of interpolation/extrapolation for systematic uncertain¬ 
ties and constraint terms for statistical uncertainties are provided. OpTHyLiC has been validated by 
comparing it to known analytical and bayesian results and to McLimit in situations where they are 
expected to give identical limits. Very good agreement has been found in all cases, hence validating the 
software. 

One of the main advantage of OpTHyLiC is its speed. Even in realistic cases with dozens of nuisance 
parameters and several channels and bins, limits are typically computed in less than one minute. 
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